Invalidate `PackedInstruction` cache upon modification #13543

raynelfss · 2024-12-10T12:23:45Z

Summary

This should supercede #13532, and should correctly fix #13504
A previous mis-implementation allowed shared references between cloned instances of PackedInstruction when it came to their Python counterparts. The following commits re-implement the Clone trait and makes PackedInstruction attributes accessible only through views. This is done to allow the cached PyGate to be discarded upon modification of op, params or extra_attrs attributes of said instance of PackedInstruction.

Details and comments

As referenced by @Cryoris in #13504, we were finding unknown parameters when performing translations in parallel. This was due to modifications coming from BasisTranslator helper function compose_transforms which would modify certain parameters from a circuit. The reason for this was that we had been unintentionally copying references to the cached python gates stored within these operation instances.

The following commits perform a couple of changes on the PackedInstruction class to prevent this from happening:

Implement the Clone trait for PackedInstruction such that the cached py_gate reference is discarded.
Control any modifications of the cached_pygate by discarding the cache whenever these attributes are borrowed mutably. This was done by:
- Making all attributes of the PackedInstruction private.
- Create getters and setters (more accurately mutable getters) that control when it is proper to discard the cached gate.

These changes had a propagated effect in the codebase and a lot of other pieces of code had to be modified for it to work with it.

Notes:

There are a couple of performance regressions related to circuit construction time. This is a tradeoff from having to rebuild the cached gate many more times than it was required before.

Co-authored-by: Julien Gacon gaconju@gmail.com

- Make all attributes of the `PackedInstruction` private to allow for control of the underlying cached `PyObject` reference stored in Python. This ensures that no dead references are left as soon as the gate is modified from the Rust side. - Reimplement `Clone` trait for `PackedInstruction`s to clear the cached gate whenever we decide make calls to `clone()`. - Re-organize rest of the code to adapt to the new changes.

crates/circuit/src/packed_instruction.rs

Cryoris · 2024-12-10T12:42:12Z

While I'm not sure how to test the original bug in CI (since we turn Qiskit in parallel off) we can test @jakelishman's reproducer from: #13532 (comment) 🙂

coveralls · 2024-12-10T12:50:01Z

Pull Request Test Coverage Report for Build 12280844236

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

628 of 644 (97.52%) changed or added relevant lines in 24 files are covered.
18 unchanged lines in 6 files lost coverage.
Overall coverage increased (+0.03%) to 88.966%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
crates/accelerate/src/commutation_cancellation.rs	6	7	85.71%
crates/circuit/src/packed_instruction.rs	84	87	96.55%
crates/accelerate/src/basis/basis_translator/mod.rs	76	82	92.68%
crates/circuit/src/dag_circuit.rs	209	215	97.21%

Files with Coverage Reduction	New Missed Lines	%
crates/accelerate/src/unitary_synthesis.rs	1	92.14%
crates/accelerate/src/basis/basis_translator/compose_transforms.rs	1	97.44%
crates/circuit/src/interner.rs	3	88.52%
crates/qasm2/src/lex.rs	3	92.23%
crates/accelerate/src/basis/basis_translator/mod.rs	4	89.42%
crates/circuit/src/circuit_data.rs	6	94.82%

Totals
Change from base Build 12236482718:	0.03%
Covered Lines:	79388
Relevant Lines:	89234

💛 - Coveralls

raynelfss · 2024-12-10T14:35:48Z

These changes do fix things but there is a big performance regression in circuit construction. Since I removed a lot of the pre-caching that was being done prior to introducing these changes, I thought adding those back would fix the performance regression, however, local testing proved that including those made things even slower. I will post those numbers later, but slowdowns were as high as 2x slower. This is the current regression:

Change	Before [`841cb29`]	After [`b337cf3`]	Ratio	Benchmark (Parameter)
+	60.9±2μs	93.0±4μs	1.53	circuit_construction.CircuitConstructionBench.time_circuit_copy(8, 8192)
+	215±4μs	315±6μs	1.47	circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 32768)
+	216±9μs	314±9μs	1.46	circuit_construction.CircuitConstructionBench.time_circuit_copy(8, 32768)
+	838±30μs	1.21±0.01ms	1.44	circuit_construction.CircuitConstructionBench.time_circuit_copy(1, 131072)
+	60.9±1μs	87.3±2μs	1.43	circuit_construction.CircuitConstructionBench.time_circuit_copy(14, 8192)
+	223±10μs	315±10μs	1.41	circuit_construction.CircuitConstructionBench.time_circuit_copy(1, 32768)
+	215±6μs	301±8μs	1.4	circuit_construction.CircuitConstructionBench.time_circuit_copy(5, 32768)
+	61.4±1μs	85.5±7μs	1.39	circuit_construction.CircuitConstructionBench.time_circuit_copy(2, 8192)
+	1.32±0.02ms	1.80±0.06ms	1.37	circuit_construction.CircuitConstructionBench.time_circuit_copy(14, 131072)
+	220±6μs	300±8μs	1.36	circuit_construction.CircuitConstructionBench.time_circuit_copy(2, 32768)
+	19.3±0.4μs	26.1±0.5μs	1.35	circuit_construction.CircuitConstructionBench.time_circuit_copy(1, 2048)
+	223±6μs	301±10μs	1.35	circuit_construction.CircuitConstructionBench.time_circuit_copy(14, 32768)
+	20.6±0.4μs	27.9±1μs	1.35	circuit_construction.CircuitConstructionBench.time_circuit_copy(5, 2048)
+	22.1±0.9μs	29.8±1μs	1.35	circuit_construction.CircuitConstructionBench.time_circuit_copy(8, 2048)
+	60.0±3μs	79.3±4μs	1.32	circuit_construction.CircuitConstructionBench.time_circuit_copy(1, 8192)
+	1.33±0.02ms	1.75±0.09ms	1.32	circuit_construction.CircuitConstructionBench.time_circuit_copy(2, 131072)
+	61.0±2μs	80.4±4μs	1.32	circuit_construction.CircuitConstructionBench.time_circuit_copy(5, 8192)
+	64.7±1μs	84.5±0.9μs	1.31	circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 8192)
+	24.0±0.8μs	31.0±2μs	1.29	circuit_construction.CircuitConstructionBench.time_circuit_copy(14, 2048)
+	20.9±0.6μs	27.0±0.7μs	1.29	circuit_construction.CircuitConstructionBench.time_circuit_copy(2, 2048)
+	1.44±0.08ms	1.83±0.07ms	1.27	circuit_construction.CircuitConstructionBench.time_circuit_copy(8, 131072)
+	1.34±0.02ms	1.67±0.03ms	1.25	circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 131072)
+	26.0±0.5μs	32.0±0.7μs	1.23	circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 2048)
+	1.37±0.06ms	1.68±0.1ms	1.23	circuit_construction.CircuitConstructionBench.time_circuit_copy(5, 131072)
+	5.92±0.07ms	6.90±0.3ms	1.16	circuit_construction.ParamaterizedDifferentCircuit.time_DTC100_set_build(100, 10)
+	7.73±0.06ms	8.56±0.2ms	1.11	circuit_construction.ParamaterizedDifferentCircuit.time_DTC100_set_build(50, 50)
+	6.34±0.09μs	6.98±0.1μs	1.1	circuit_construction.CircuitConstructionBench.time_circuit_copy(1, 128)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

With this comes the question of whether we should continue to explore ways of optimizing this and, were we not to find a way of optimizing, should we choose having a more stable solution in exchange of a potentially slower circuit builder?

raynelfss · 2024-12-10T14:36:37Z

While I'm not sure how to test the original bug in CI (since we turn Qiskit in parallel off) we can test @jakelishman's reproducer from: #13532 (comment) 🙂

It seems to work well with this based on local tests.

raynelfss · 2024-12-10T17:16:10Z

Other than the circuit instruction benchmarks, not much seems to have changed in the utility scale, although a couple benchmarks seem to have slowed down.

Utility scale benchmarks:

Before [`841cb29`]	After [`b337cf3`]	Ratio	Benchmark (Parameter)
0	0	n/a	utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('cx')
0	0	n/a	utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('cz')
0	0	n/a	utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('ecr')
328±8ms	343±10ms	1.05	utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')
345±8ms	357±4ms	1.03	utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')
500±10ms	513±10ms	1.03	utility_scale.UtilityScaleBenchmarks.time_qft('cx')
7.12±0.1ms	7.24±0.2ms	1.02	utility_scale.UtilityScaleBenchmarks.time_bvlike('cz')
2.77±0.01s	2.82±0.02s	1.02	utility_scale.UtilityScaleBenchmarks.time_circSU2('cz')
2.77±0.04s	2.79±0.05s	1.01	utility_scale.UtilityScaleBenchmarks.time_circSU2('ecr')
4.86±0.2ms	4.93±0.1ms	1.01	utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')
51.7±0.8ms	52.1±0.9ms	1.01	utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz')
51.5±1ms	52.2±0.6ms	1.01	utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr')
16.5±0.4ms	16.6±0.1ms	1.01	utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx')
279±3ms	283±4ms	1.01	utility_scale.UtilityScaleBenchmarks.time_qaoa('cx')
608±20ms	614±20ms	1.01	utility_scale.UtilityScaleBenchmarks.time_qft('cz')
103±1ms	103±2ms	1.00	utility_scale.UtilityScaleBenchmarks.time_bv_100('cz')
2.79±0.03s	2.78±0.01s	1.00	utility_scale.UtilityScaleBenchmarks.time_circSU2('cx')
51.6±0.8ms	51.6±1ms	1.00	utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx')
16.6±0.5ms	16.6±0.4ms	1.00	utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')
16.4±0.4ms	16.5±0.4ms	1.00	utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr')
661±20ms	659±10ms	1.00	utility_scale.UtilityScaleBenchmarks.time_qv('cx')
733±20ms	737±20ms	1.00	utility_scale.UtilityScaleBenchmarks.time_qv('ecr')
390	390	1.00	utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('cx')
397	397	1.00	utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('cz')
397	397	1.00	utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('ecr')
300	300	1.00	utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('cx')
300	300	1.00	utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('cz')
300	300	1.00	utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('ecr')
1607	1607	1.00	utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cx')
1622	1622	1.00	utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cz')
1622	1622	1.00	utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('ecr')
1954	1954	1.00	utility_scale.UtilityScaleBenchmarks.track_qft_depth('cx')
1954	1954	1.00	utility_scale.UtilityScaleBenchmarks.track_qft_depth('cz')
1954	1954	1.00	utility_scale.UtilityScaleBenchmarks.track_qft_depth('ecr')
2709	2709	1.00	utility_scale.UtilityScaleBenchmarks.track_qv_depth('cx')
2709	2709	1.00	utility_scale.UtilityScaleBenchmarks.track_qv_depth('cz')
2709	2709	1.00	utility_scale.UtilityScaleBenchmarks.track_qv_depth('ecr')
462	462	1.00	utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cx')
462	462	1.00	utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cz')
462	462	1.00	utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('ecr')
98.2±3ms	97.7±1ms	0.99	utility_scale.UtilityScaleBenchmarks.time_bv_100('cx')
103±1ms	102±2ms	0.99	utility_scale.UtilityScaleBenchmarks.time_bv_100('ecr')
7.20±0.2ms	7.13±0.2ms	0.99	utility_scale.UtilityScaleBenchmarks.time_bvlike('cx')
4.90±0.1ms	4.87±0.08ms	0.99	utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')
125±3ms	124±1ms	0.99	utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')
124±3ms	123±2ms	0.99	utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr')
5.00±0.04ms	4.89±0.07ms	0.98	utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx')
611±20ms	602±10ms	0.98	utility_scale.UtilityScaleBenchmarks.time_qft('ecr')
760±20ms	739±3ms	0.97	utility_scale.UtilityScaleBenchmarks.time_qv('cz')
7.48±0.1ms	7.19±0.2ms	0.96	utility_scale.UtilityScaleBenchmarks.time_bvlike('ecr')
115±3ms	110±2ms	0.96	utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')

BENCHMARKS NOT SIGNIFICANTLY CHANGED.

Transpiler levels

Change	Before [`841cb29`]	After [`b337cf3`] <rebalance-packed-instructions~1>	Ratio	Benchmark (Parameter)
+	40.3±2ms	49.5±0.2ms	1.23	transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(0)
+	40.5±1ms	49.2±0.2ms	1.21	transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(1)
-	12.3±0.2ms	11.1±0.2ms	0.9	transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(3)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

- Add views for `py_op`. - Remove stale saching being done in `apply_operation_*` in `DAGCircuit`. - Adapt the code to new changes. - Add missing docstring

raynelfss · 2024-12-10T17:49:34Z

There seems to be a strange effect in which some benchmarks improve while others get worse, which is strange:

Change	Before [`841cb29`]	After [590b4a3a]	Ratio	Benchmark (Parameter)
+	16.7±0.5ms	22.4±0.3ms	1.35	transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_qft_16(2, 'sabre', 'dense')
+	38.5±0.2ms	50.2±0.5ms	1.3	transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(0)
+	40.9±0.9ms	49.6±0.2ms	1.21	transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(1)
-	24.0±0.5ms	16.5±0.4ms	0.69	transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_qft_16(3, 'sabre', 'sabre')
-	23.3±0.2ms	15.7±0.1ms	0.67	transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_qft_16(3, 'stochastic', 'dense')
-	24.4±0.5ms	15.8±0.2ms	0.65	transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_qft_16(3, 'sabre', 'dense')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

qiskit-bot · 2024-12-10T19:23:06Z

One or more of the following people are relevant to this code:

@Qiskit/terra-core

kevinhartman

This looks good so far, but I've noted a few concerns inline.

Specifically, it isn't entirely clear to me when we can preserve the py_op, if at all. In its current state, it looks like this PR always throws it out, even when building new instances of PackedInstruction via ::new, and even when cloning a singleton gate. If we're doing so unnecessarily, that is certainly performance we're leaving (/putting back) on the table. Do you know if that's the case @raynelfss?

It'd be great if @jakelishman or @mtreinish can confirm prior to the potential inclusion of this PR in 1.3.1 this week, though they're both on holiday.

kevinhartman · 2024-12-10T20:43:03Z

crates/circuit/src/circuit_data.rs

            #[cfg(feature = "cache_pygates")]
            {
-                inst.py_op = py_op.unbind().into();
+                *inst.py_op_mut() = py_op.unbind().into();


If we consider constructing a new PackedInstruction with a user-provided Python op for the cache as unsafe, then we should also consider this to be unsafe, correct? Since the instance we get by calling func is something the user may have a reference to.

I agree with your reasoning, I'm okay with making the py_op attribute fully invisible since it is still unsafe (even when immutably borrowed).

kevinhartman · 2024-12-10T20:51:26Z

crates/circuit/src/dag_circuit.rs

-                params: (!py_op.params.is_empty()).then(|| Box::new(py_op.params)),
-                extra_attrs: py_op.extra_attrs,
-                #[cfg(feature = "cache_pygates")]
-                py_op: op.unbind().into(),


Do we actually need to throw away the py_op when creating new PackedInstruction instances?

Surely we anticipated that it would be an issue for Python-side code to modify the cached objects at the time we introduced the py_op cache. That makes me think we should preserve py_op as part of PackedOperation::new. Isn't the issue only via the default Clone for PackedInstruction?

I realize that @jakelishman is on holiday, but perhaps if he sees this he can clarify.

Yes, but also think of it this way.

If you look at the PackedInstruction::unpack_py_op() method, in which a reference is built to a python object (basically what we do when the cache_pygate flag is disabled), in the case of a standard gate, we build a new instance of py_op based on the current parameters or the extra attributes of our gate, and not only that, but the instance we get from this function whenever dealing with any other type of gate comes directly from the OperationRef type if it contains a python object within and isn't really affected by changes unless our op type is StandardGate.

Why do I mention this? Well, whenever a gate is initialized without a cached pygate, this is the function in charge of building that gate for us. Since this gate is disconnected and changes won't propagate, we'd need to update it by creating a new one whenever our params or attributes are modified (or at least that's what I understand from here).

We could technically optimize further by only re-building if we have an instance of Standard gate, since what we do in other cases is a call to clone_ref(py). What do you think?

Python doesn't enforce ownership in the way Rust does, but we can still think of it in similar terms. DAGCircuit.apply_operation_back has always behaved as if calling it transferred ownership of the gate object, as opposed to QuantumCircuit.append, which copied the object if it detected that you might want to modify it (i.e. it was parametric).

Since the Python object coming in is logically "owned" by the function now, I think it's fine to put it in the cache.

With all the Python stuff, we can't compile-time (or even, in most cases, run-time) forbid modifications to Python objects, so there's always going to be the possibility for a user to break us a bit, and some of our Python-based will be pragmatics. I think in this case, saving the incoming object in the cache is no more dangerous than having the cache at all, so it should be fine to keep.

kevinhartman · 2024-12-10T20:57:10Z

crates/circuit/src/packed_instruction.rs

+    // Setters
+
+    /// Retrieves an immutable reference to the instruction's underlying operation.
+    pub fn op_mut(&mut self) -> &mut PackedOperation {


It looks to me that all of these _mut methods are only used within the qiskit-circuit crate. Should we introduce these with visibility pub(crate) instead?

Originally PackedInstruction was more of a Rust POD, but now it feels like we should lock down its internals where possible since it is managing its own cache invalidation.

Sure, this makes more sense

I think making the methods pub(crate) is a bit needlessly restrictive. Anywhere you can get a &mut PackedInstruction, you can effectively change any fields you need to, because you can *instruction = my_new_packed_instruction, so the actual external access restriction for the circuit needs to be purely managed by when you can get a &mut PackedInstruction.

Otherwise, it's just needlessly restrictive - if you can create a new PackedInstruction with pub visibility, then making these methods pub(crate) doesn't actually restrict any capability.

crates/circuit/src/packed_instruction.rs

kevinhartman · 2024-12-10T21:15:52Z

crates/circuit/src/packed_instruction.rs

+            params: self.params.clone(),
+            extra_attrs: self.extra_attrs.clone(),
+            #[cfg(feature = "cache_pygates")]
+            py_op: OnceLock::new(),


Toward the beginning of the discussion here, @jakelishman says:

the right solution is to manually impl Clone for PackedInstruction to not propagate the cache for gates that aren't Python-space singletons.

Should we be preserving the py_op in the case of singleton gates to avoid a more significant performance regression?

How would we be able to identify which ones are singletons?

I was talking mostly about what's necessary, rather than what's easy. Technically it's safe to propagate the cache if the Python object itself is immutable. If it's too hard to tell, or we deem it too risky, we can just zero them all out.

Fwiw, we'd basically just apply a Rust-only heuristic: check the PackedOperation is a Standard and it's a 0-parameter gate with extra_attrs == None. That won't catch every single singleton, but it'll do the vast majority and fails safe.

raynelfss · 2024-12-11T14:57:50Z

@kevinhartman I'd honestly be okay with not keeping an initialized py_op in any case whatsoever. We can have it be a cached instance that's only built when needed and discard it anytime we modify a parameter, op, or extra_attrs. I feel this makes much more sense as it would produce more stable results. I wonder how the rest of the team feels about this.

As for what happens if we bring back the deleted assigned py_op instances, the performance is actually way worse (though I'm not quite sure why that is). I made a divergent branch not too long ago in which I added py_op as an argument for PackedInstruction::new() and it resulted in an even greater performance regression during circuit construction:

Change	Before [`84def52`]	After [`d692313`]	Ratio	Benchmark (Parameter)
+	121±10μs	187±40μs	1.54	circuit_construction.CircuitConstructionBench.time_circuit_copy(8, 8192)
+	110±2μs	138±10μs	1.25	circuit_construction.CircuitConstructionBench.time_circuit_copy(5, 8192)
+	373±20μs	454±20μs	1.22	circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 32768)
+	114±3μs	133±0.6μs	1.17	circuit_construction.CircuitConstructionBench.time_circuit_copy(14, 8192)
+	112±4μs	131±3μs	1.17	circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 8192)
+	33.8±2μs	38.8±0.6μs	1.15	circuit_construction.CircuitConstructionBench.time_circuit_copy(14, 2048)
+	37.4±1μs	43.0±2μs	1.15	circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 2048)
+	3.19±0.05ms	3.56±0.03ms	1.12	circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 131072)
+	3.22±0.01ms	3.57±0.03ms	1.11	circuit_construction.CircuitConstructionBench.time_circuit_copy(1, 131072)
+	3.15±0.03ms	3.50±0.01ms	1.11	circuit_construction.CircuitConstructionBench.time_circuit_copy(5, 131072)
+	3.21±0.05ms	3.54±0.03ms	1.1	circuit_construction.CircuitConstructionBench.time_circuit_copy(2, 131072)
+	3.18±0.03ms	3.51±0.04ms	1.1	circuit_construction.CircuitConstructionBench.time_circuit_copy(8, 131072)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

It's very puzzling as to why this is happening, but if we could make the attribute fully private to the PackedInstruction instance and have it only generate it when needed, it would make a lot more sense to keep this around.

releasenotes/notes/fix-packed-instruction-f2620d5aff61269d.yaml

Co-authored-by: Kevin Hartman <kevin@hart.mn>

jakelishman · 2024-12-11T16:52:22Z

I'm very surprised to see performance dropping if some of the caches are being propagated but not all - it might be a good idea to try and dig heavily into that and really understand why it's happened, especially since this is being considered for a bugfix.

Fwiw, this feels like a very major change to be targetted at a bugfix to me, especially a release that might go out tomorrow. It might worth considering squashing the BasisTranslator problem in a less ideal way for the bugfix release (like just having assign_parameters in Rust space clear the pyop cache rather than attempting to mutate the Python object) that we know is safe and doesn't have the same not-quite-understood performance drawbacks. But of course, I'm still on leave and don't have anywhere near the full context for anything going on here, so you guys should actually make the decision, not me.

kevinhartman · 2024-12-11T17:37:03Z

Fwiw, this feels like a very major change to be targetted at a bugfix to me, especially a release that might go out tomorrow. It might worth considering squashing the BasisTranslator problem in a less ideal way for the bugfix release (like just having assign_parameters in Rust space clear the pyop cache rather than attempting to mutate the Python object) that we know is safe and doesn't have the same not-quite-understood performance drawbacks. But of course, I'm still on leave and don't have anywhere near the full context for anything going on here, so you guys should actually make the decision, not me.

We're all in agreement that this PR should not merge until after 1.3.1. @raynelfss is working on a separate more localized hotfix for #13504 for this release, and we'll revisit this one in the future when we have time to understand the performance characteristics.

raynelfss added 2 commits December 8, 2024 17:10

Fix: Do not discard cache when modifying qubits and clbits

b337cf3

raynelfss changed the title ~~[WIP] Invalidate PackedInstruction cached upon modification~~ [WIP] Invalidate PackedInstruction cache upon modification Dec 10, 2024

Cryoris reviewed Dec 10, 2024

View reviewed changes

crates/circuit/src/packed_instruction.rs Show resolved Hide resolved

crates/circuit/src/packed_instruction.rs Show resolved Hide resolved

crates/circuit/src/packed_instruction.rs Show resolved Hide resolved

crates/circuit/src/packed_instruction.rs Show resolved Hide resolved

kevinhartman added this to the 1.3.1 milestone Dec 10, 2024

Fix: Make py_op private.

473ca03

- Add views for `py_op`. - Remove stale saching being done in `apply_operation_*` in `DAGCircuit`. - Adapt the code to new changes. - Add missing docstring

raynelfss added bug Something isn't working Changelog: Bugfix Include in the "Fixed" section of the changelog mod: circuit Related to the core of the `QuantumCircuit` class or the circuit library priority: high labels Dec 10, 2024

Docs: Add release note

84def52

raynelfss marked this pull request as ready for review December 10, 2024 19:23

raynelfss requested a review from a team as a code owner December 10, 2024 19:23

raynelfss changed the title ~~[WIP] Invalidate PackedInstruction cache upon modification~~ Invalidate PackedInstruction cache upon modification Dec 10, 2024

raynelfss mentioned this pull request Dec 10, 2024

Copy circuits and parameters when adding them to the EquivalenceLibrary in Rust. #13532

Closed

1 task

raynelfss added the stable backport potential The bug might be minimal and/or import enough to be port to stable label Dec 10, 2024

kevinhartman reviewed Dec 10, 2024

View reviewed changes

alexanderivrii reviewed Dec 11, 2024

View reviewed changes

releasenotes/notes/fix-packed-instruction-f2620d5aff61269d.yaml Outdated Show resolved Hide resolved

raynelfss and others added 2 commits December 11, 2024 11:29

Apply suggestions from code review

7a6f5f1

Co-authored-by: Kevin Hartman <kevin@hart.mn>

Fix: Docstrings and "no no" in release note

64b1644

kevinhartman modified the milestones: 1.3.1, 1.3.2 Dec 11, 2024

raynelfss added the on hold Can not fix yet label Dec 11, 2024

raynelfss mentioned this pull request Dec 11, 2024

Discard cache for standard gates in assign_parameters #13557

Merged

mergify bot mentioned this pull request Dec 12, 2024

Discard cache for standard gates in assign_parameters (backport #13557) #13559

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Invalidate `PackedInstruction` cache upon modification #13543

Invalidate `PackedInstruction` cache upon modification #13543

raynelfss commented Dec 10, 2024 •

edited

Loading

Cryoris commented Dec 10, 2024

coveralls commented Dec 10, 2024 •

edited

Loading

raynelfss commented Dec 10, 2024

raynelfss commented Dec 10, 2024

raynelfss commented Dec 10, 2024

raynelfss commented Dec 10, 2024

qiskit-bot commented Dec 10, 2024

kevinhartman left a comment

kevinhartman Dec 10, 2024

raynelfss Dec 11, 2024

kevinhartman Dec 10, 2024

raynelfss Dec 11, 2024 •

edited

Loading

jakelishman Dec 11, 2024

kevinhartman Dec 10, 2024

raynelfss Dec 11, 2024

jakelishman Dec 11, 2024

kevinhartman Dec 10, 2024

raynelfss Dec 11, 2024

jakelishman Dec 11, 2024

raynelfss commented Dec 11, 2024

jakelishman commented Dec 11, 2024

kevinhartman commented Dec 11, 2024

Invalidate PackedInstruction cache upon modification #13543

Are you sure you want to change the base?

Invalidate PackedInstruction cache upon modification #13543

Conversation

raynelfss commented Dec 10, 2024 • edited Loading

Summary

Details and comments

Notes:

Cryoris commented Dec 10, 2024

coveralls commented Dec 10, 2024 • edited Loading

Pull Request Test Coverage Report for Build 12280844236

Warning: This coverage report may be inaccurate.

Details

💛 - Coveralls

raynelfss commented Dec 10, 2024

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY. PERFORMANCE DECREASED.

raynelfss commented Dec 10, 2024

raynelfss commented Dec 10, 2024

Utility scale benchmarks:

BENCHMARKS NOT SIGNIFICANTLY CHANGED.

Transpiler levels

raynelfss commented Dec 10, 2024

qiskit-bot commented Dec 10, 2024

kevinhartman left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

raynelfss Dec 11, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

raynelfss commented Dec 11, 2024

jakelishman commented Dec 11, 2024

kevinhartman commented Dec 11, 2024

Invalidate `PackedInstruction` cache upon modification #13543

Invalidate `PackedInstruction` cache upon modification #13543

raynelfss commented Dec 10, 2024 •

edited

Loading

coveralls commented Dec 10, 2024 •

edited

Loading

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

raynelfss Dec 11, 2024 •

edited

Loading